A Representation Scheme for Finite Length Strings

Authors

  • Gaurav Tandon
  • Debasis Mitra

Abstract

This study is an attempt to create a canonical representation scheme for finite-length strings that simplifies the study of the theory behind different classes of patterns and eases the understanding of the underlying separability issues. Such a representation could then be used to determine which kinds of techniques are suitable for which class of separability (linear, multi-linear or non-linear), and applied in intrusion detection, biological sequence analysis, pattern recognition/classification and numerous other applications.

INTRODUCTION

Various approaches have been proposed to detect intrusions by detecting deviations from normal host-based behavior. Host-based systems attempt to detect anomalies from system call information, an idea first suggested by Forrest et al. [8]: sequences of system calls are used to determine whether or not an intrusion has taken place. Such a sequence of system calls is simply a sequence of finite-length strings. This study is an attempt to define an abstract representation for the strings representing the various system calls of a host-based anomaly detection system. Such a representation can also be used in other areas where sequences of strings arise, for example the study of DNA sequences and pattern recognition.

The motivation for such a representation is to make it easier to study and understand the theory behind different classes of patterns and the underlying separability issues. With a canonical representation, we would be able to determine which kinds of techniques are suitable for which class of separability: linear, multi-linear or non-linear.

The simplest case of a linearly separable decision problem consists of two sets of points (patterns) in a 2-D vector space that belong to different classes, where the two classes can be separated by a straight line. In three dimensions, the equivalent problem is one where the points are separable by a plane.
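To make the definition concrete, the following Python sketch (an illustration only, not part of the original paper; the helper names `side` and `separates` and the sample points are hypothetical) checks whether a given candidate hyperplane w · x + b = 0 strictly separates two finite point sets. Because it works in any number of dimensions, the same check covers a straight line in 2-D and a plane in 3-D. Note that it only verifies a candidate; it does not search for one or decide whether any separating hyperplane exists.

```python
from typing import Sequence

Point = Sequence[float]


def side(w: Point, b: float, x: Point) -> float:
    """Signed value of w . x + b; its sign tells which side of the hyperplane x lies on.

    Assumes w and x have the same length (zip silently truncates otherwise).
    """
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def separates(w: Point, b: float, class1: list[Point], class2: list[Point]) -> bool:
    """True if the hyperplane w . x + b = 0 puts every point of class1 strictly
    on the positive side and every point of class2 strictly on the negative side."""
    return (all(side(w, b, x) > 0 for x in class1)
            and all(side(w, b, x) < 0 for x in class2))


# Two hypothetical classes in 2-D: points above vs. below the line y = x.
above = [(0.0, 1.0), (1.0, 3.0), (-2.0, 0.0)]
below = [(1.0, 0.0), (3.0, 1.0), (0.0, -2.0)]

# The line -x + y = 0 (i.e. y = x), written as w = (-1, 1), b = 0.
print(separates((-1.0, 1.0), 0.0, above, below))  # True
# The x-axis (w = (0, 1), b = 0) does not: (-2, 0) lies exactly on it.
print(separates((0.0, 1.0), 0.0, above, below))   # False
```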
We can extend this idea to any number of dimensions, although spaces of four or more dimensions are hard to visualize. Straight lines and planes are simply hyperplanes of dimension 1 and 2 (in 2-D and 3-D space, respectively), and there are hyperplanes of every higher dimension. In general, two classes in an N-dimensional space are termed linearly separable if they can be separated by a hyperplane of dimension N-1. The strings in our representation scheme could represent the various system calls of a host-based anomaly (intrusion) detection system.

RELATED WORK

To our knowledge, there is no prior work in this particular area that we can cite. However, we studied several works that could assist us in designing our representation scheme; they are described next.

Linear Separability of Sets

Consider two sets $X_1$ and $X_2$ in Euclidean space $E^n$, so that each element $x$ of these sets is an n-tuple $x = (x_1, x_2, \ldots, x_n)$. A hyperplane separating the two sets is represented by a nonzero vector in $E^{n+1}$, the normal to the separating hyperplane: $a = (p_1, p_2, \ldots, p_n, p)$.

The following algorithm [2] is a recursive procedure for finding the vector $a$, the normal to the hyperplane separating $X_1$ and $X_2$. It is defined as a function $a = A(s, P)$ of two parameters:
1. $s$, the number of elements in the subset $Y_s = \{y_1, y_2, \ldots, y_s\}$ of $Y$;
2. $P$, a subspace of Euclidean space $E$ of dimension $k+1$.

The algorithm also assumes that $X_1$ and $X_2$ do not lie in a common hyperplane; under this assumption, $Y = X_1 \cup X_2$ contains a basis of $E$. The algorithm (from [6]) is as follows:
1. Find a subset $B$ of $Y$ whose union with $P$ exhibits linear closure under $E$: $B = \{y_{r_0}, y_{r_1}, \ldots, y_{r_k}\}$, and let $B_1 = \{y_{r_1}, y_{r_2}, \ldots, y_{r_k}\}$.
2. Find a vector $x$ such that $x \in P \cap B_1^{\perp}$ and $x \cdot y_{r_0} > 0$.
3. If $k = 0$, return the $x$ found in step 2 as the solution $a$. Otherwise, define the sequence $a_0 = x, a_1, a_2, \ldots, a_s = a$ as follows.
4. Set $a_0 := x$.
5. For $i = 1, \ldots, s$, apply the recursive step: if $a_{i-1} \cdot y_i > 0$, then $a_i := a_{i-1}$; otherwise $a_i := A(i-1, P \cap \{y_i\}^{\perp})$.

The algorithm terminates in step 3, returning the vector $a$ as the normal of the hyperplane separating $X_1$ and $X_2$.

Threshold Logic Unit

In a Threshold Logic Unit (TLU) [9], the output of the unit in response to a particular input pattern is calculated in two stages: first the activation (the weighted sum of the inputs) is computed, and then the activation is passed through a threshold function to obtain the output. Learning in a TLU consists of using an automatic procedure to adjust the weights and threshold so that the decision boundary minimizes an error function. A simple learning procedure for a TLU is the perceptron learning rule, whose error function is based on the number of incorrectly classified points in the training set; the perceptron learning procedure minimizes this error function. The Least Mean Squares (LMS) rule instead minimizes an error measure called the mean squared error. In the TLU implementation of this rule, the error for each pattern is the difference between the desired and the actual activation; that is, the error is calculated from the activation of the TLU before it is passed through the threshold function, and the weights are updated accordingly. If the weights are updated after every input pattern, the weights (and hence the decision boundary) will keep moving around the point of lowest error rather than settling exactly on it.

Support Vector Machines

A support vector machine [4] finds a nonlinear decision function in the input space by mapping the data into a higher-dimensional feature space and separating them there by means of a maximum-margin hyperplane. The computational complexity of the classification operation does not depend on the dimensionality of the feature space, which can even be infinite.
Overfitting is avoided by controlling the margin. The separating hyperplane is represented sparsely as a linear combination of training points (the support vectors): the system automatically identifies a subset of informative points and uses them to represent the solution. Finally, the training algorithm solves a simple convex optimization problem. All these features make support vector machines an attractive classification method.

Convex Hull Methods

Another strategy for separating one set of points from another in the plane is to compute the convex hulls of the two sets [5]. If the convex hulls overlap, the sets are not linearly separable; if they are disjoint, a separating line exists. Several algorithms are known for computing convex hulls, including Graham's scan [7], Jarvis' march [1] and Quickhull [3]. Convex hulls in two and even three dimensions are fairly easy to work with. However, as the dimension of the space increases, certain assumptions that are valid in lower dimensions break down. For example, any n-vertex polygon in two dimensions has exactly n edges, whereas the relationship between the numbers of faces and vertices of a polytope is already more complicated in three dimensions.

OUR EXERCISE

The idea is to create a lattice-based structure that can serve as an abstract representation for differentiating between “normal” and “abnormal” sequences of system calls, that is, between two different sets of strings. The set of system calls is finite. Let each system call be represented by a unique natural number: 1, 2, 3 and so on. Our goal is to define an abstract representation of these system calls (which are inherently strings). We start with a simple model consisting of three system calls and then build upon it, adding more complexity and venturing into higher dimensions, as we explain later. Let system calls s1, s2 and s3 be represented by 1, 2 and 3, respectively. There are a total of 3! = 6 permutations of these three numbers (representing strings/system calls), namely 1 2 3, 1 3 2, 2 1 3, 3 1 2, 3 2 1 and 2 3 1.
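The enumeration above can be reproduced mechanically. The Python sketch below is only an illustration (the function name `orderings` is hypothetical and not part of the authors' scheme): it labels the system calls 1..n, lists every ordering in which each call appears exactly once using the standard `itertools.permutations`, and confirms that there are n! of them.

```python
from itertools import permutations
from math import factorial


def orderings(num_calls: int) -> list[tuple[int, ...]]:
    """All orderings of the system calls labelled 1..num_calls,
    each call appearing exactly once (i.e. all permutations)."""
    calls = range(1, num_calls + 1)
    return list(permutations(calls))


three = orderings(3)
print(len(three), factorial(3))  # 6 6
for p in three:
    print(*p)
# 1 2 3
# 1 3 2
# 2 1 3
# 2 3 1
# 3 1 2
# 3 2 1
```

Because its input is sorted, `itertools.permutations` emits the tuples in lexicographic order, so the listing differs in order, but not in content, from the one given in the text; for four system calls the same function returns 4! = 24 orderings.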

Similar Resources

A Finite Zitterbewegung Model for Relativistic Quantum Mechanics*

Starting from steps of length h/mc and time intervals h/mc², which imply a quasi-local Zitterbewegung with velocity steps ±c, we employ discrimination between bit-strings of finite length to construct a necessarily 3+1 dimensional eventspace for relativistic quantum mechanics. By using the combinatorial hierarchy to label the strings, we provide a successful start on constructing the coupli...

Using a Local Orthogonal Coordinate System for Modeling Two-Dimensional Cracks with the Extended Finite Element Method

The extended finite element method (X-FEM) is a numerical method for modeling discontinuities, such as cracks, within the standard finite element framework. In X-FEM, special functions are added to the finite element approximation. For crack modeling in linear elasticity, appropriate functions are used for modeling discontinuities along the crack length and simulating the singularity in the crack...

On the equivalence between minimal sufficient statistics, minimal typical models and initial segments of the Halting sequence

It is shown that the length of the algorithmic minimal sufficient statistic of a binary string x, either in a representation of a finite set, computable semimeasure, or a computable function, has a length larger than the computational depth of x, and can solve the Halting problem for all programs with length shorter than the m-depth of x. It is also shown that there are strings for which the al...

Approximation of fixed points for a continuous representation of nonexpansive mappings in Hilbert spaces

This paper introduces an implicit scheme for a continuous representation of nonexpansive mappings on a closed convex subset of a Hilbert space with respect to a sequence of invariant means defined on an appropriate space of bounded, continuous real valued functions of the semigroup. The main result is to prove the strong convergence of the proposed implicit scheme to the unique solutio...

A Composite Finite Difference Scheme for Subsonic Transonic Flows (RESEARCH NOTE).

This paper presents a simple and computationally-efficient algorithm for solving steady two-dimensional subsonic and transonic compressible flow over an airfoil. This work uses an interactive viscous-inviscid solution by incorporating the viscous effects in a thin shear-layer. Boundary-layer approximation reduces the Navier-Stokes equations to a parabolic set of coupled, non-linear partial diff...

A Fast and Accurate Global Maximum Power Point Tracking Method for Solar Strings under Partial Shading Conditions

This paper presents a model-based approach for the global maximum power point (GMPP) tracking of solar strings under partial shading conditions. In the proposed method, the GMPP voltage is estimated without any need to solve numerically the implicit and nonlinear equations of the photovoltaic (PV) string model. In contrast to the existing methods in which first the locations of all the local pe...

Publication date: 2003